ReplayBuffer
- class cpprb.ReplayBuffer(size, env_dict=None, next_of=None, *, stack_compress=None, default_dtype=None, Nstep=None, mmap_prefix=None, **kwargs)
Bases: object
Replay Buffer class to store transitions and to sample them randomly.
The transition can contain anything compatible with NumPy data types. Users can freely specify its contents through the env_dict parameter at the constructor. A typical transition contains an observation (obs), an action (act), a reward (rew), the next observation (next_obs), and a done flag (done).
>>> env_dict = {"obs": {"shape": (4,4)},
...             "act": {"shape": 3, "dtype": np.int16},
...             "rew": {},
...             "next_obs": {"shape": (4,4)},
...             "done": {}}
In this class, sampling is uniform random sampling, so the same transition can be chosen multiple times.
Initialize ReplayBuffer
- Parameters
size (int) – Buffer size
env_dict (dict of dict, optional) – Dictionary specifying environments. The keys of env_dict become environment names. The values of env_dict, which are also dict, define "shape" (default 1) and "dtype" (fallback to default_dtype).
next_of (str or array like of str, optional) – Value names whose next items share the memory region. The "next_"-prefixed items (e.g. next_obs for obs) are automatically added to env_dict without duplicating memory.
stack_compress (str or array like of str, optional) – Value names whose duplicated stack dimension is compressed. The values must be stacked along the last dimension.
default_dtype (numpy.dtype, optional) – Fallback dtype. The default value is numpy.single.
Nstep (dict, optional) – If this option is specified, N-step rewards are used. Nstep["size"] is an int specifying the step size of the N-step reward. Nstep["rew"] is a str or array like of str specifying the N-step rewards to be summed. Nstep["gamma"] is a float specifying the discount factor; its default is 0.99. Nstep["next"] is a str or list of str specifying the next values to be moved. When this option is enabled, "done" is required in env_dict. (A sketch of this option is shown after the constructor examples below.)
mmap_prefix (str, optional) – File name prefix used to map buffer data with mmap. If None (default), data are stored only in memory. This feature is designed for very large data which cannot fit in physical memory.
Examples
Create a simple replay buffer with a buffer size of \(10^6\).
>>> rb = ReplayBuffer(1e+6,
...                   {"obs": {"shape": 3}, "act": {}, "rew": {},
...                    "next_obs": {"shape": 3}, "done": {}})
Create a replay buffer whose default dtype is np.float64, except that "act" is np.int8.
>>> rb = ReplayBuffer(1e+6,
...                   {"obs": {"shape": 3}, "act": {"dtype": np.int8},
...                    "rew": {},
...                    "next_obs": {"shape": 3}, "done": {}},
...                   default_dtype = np.float64)
Create a replay buffer with next_of memory compression for "obs". In this example, "next_obs" is automatically added and shares its memory with "obs".
>>> rb = ReplayBuffer(1e+6,
...                   {"obs": {"shape": 3}, "act": {}, "rew": {}, "done": {}},
...                   next_of="obs")
Create a replay buffer with stack_compress memory compression for "obs". The stacked data must be a sliding window over sequential data, and the last dimension is the stack dimension.
>>> rb = ReplayBuffer(1e+6,
...                   {"obs": {"shape": (3,2)}},
...                   stack_compress="obs")
>>> rb.add(obs=[[1,2],
...             [1,2],
...             [1,2]])
0
>>> rb.add(obs=[[2,3],
...             [2,3],
...             [2,3]])
1
Create a very large replay buffer mapped on disk.
>>> rb = ReplayBuffer(1e+9, {"obs": {"shape": 3}}, mmap_prefix="rb_data")
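Create a replay buffer with the Nstep option. The sketch below is illustrative rather than exhaustive: it assumes the standard obs/rew/next_obs/done layout from the examples above, sums "rew" over 4 steps with the default discount factor, and moves "next_obs".
>>> rb = ReplayBuffer(1e+6,
...                   {"obs": {"shape": 3}, "act": {}, "rew": {},
...                    "next_obs": {"shape": 3}, "done": {}},
...                   Nstep={"size": 4, "rew": "rew", "gamma": 0.99,
...                          "next": "next_obs"})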
Methods Summary
add(self, **kwargs) – Add transition(s) into replay buffer.
clear(self) – Clear replay buffer.
get_all_transitions(self, bool shuffle) – Get all transitions stored in replay buffer.
get_buffer_size(self) – Get buffer size.
get_current_episode_len(self) – Get current episode length.
get_next_index(self) – Get the next index to store.
get_stored_size(self) – Get stored size.
is_Nstep(self) – Get whether Nstep is used or not.
load_transitions(self, file) – Load transitions from file.
on_episode_end(self) – Call on episode end.
sample(self, batch_size) – Sample the stored transitions randomly with the specified size.
save_transitions(self, file, *[, safe]) – Save transitions to file.
Methods Documentation
- add(self, **kwargs)
Add transition(s) into replay buffer.
Multiple sets of transitions can be added simultaneously.
- Parameters
**kwargs (array like or float or int) – Transitions to be stored.
- Returns
The first index of the stored position. If all transitions are stored into the NstepBuffer and no transitions are stored into the main buffer, None is returned.
- Return type
int or None
- Raises
KeyError – If any value defined at the constructor is missing.
Warning
All values must be passed by key-value style (keyword arguments). It is the user's responsibility to ensure that all the values have the same step size.
Examples
>>> rb = ReplayBuffer(1e+6, {"obs": {"shape": 3}})
Add a single transition: [1,2,3].
>>> rb.add(obs=[1,2,3])
0
Add three sequential transitions, [1,2,3], [4,5,6], and [7,8,9], simultaneously.
>>> rb.add(obs=[[1,2,3],
...             [4,5,6],
...             [7,8,9]])
1
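For a buffer with multiple keys, all of them are passed in the same call. A minimal sketch, reusing the standard layout from the constructor examples:
>>> rb = ReplayBuffer(1e+6,
...                   {"obs": {"shape": 3}, "act": {}, "rew": {},
...                    "next_obs": {"shape": 3}, "done": {}})
>>> rb.add(obs=[1,2,3], act=1.0, rew=0.5, next_obs=[4,5,6], done=0)
0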
- clear(self) → void
Clear replay buffer.
Set index and stored_size to 0.
Example
>>> rb = ReplayBuffer(5, {"done": {}})
>>> rb.add(done=1)
0
>>> rb.get_stored_size()
1
>>> rb.get_next_index()
1
>>> rb.clear()
>>> rb.get_stored_size()
0
>>> rb.get_next_index()
0
- get_all_transitions(self, bool shuffle: bool = False)
Get all transitions stored in replay buffer.
- Parameters
shuffle (bool, optional) – When True, transitions are shuffled. The default value is False.
- Returns
transitions – All transitions stored in this replay buffer.
- Return type
dict of numpy.ndarray
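Example
A minimal usage sketch, assuming the default numpy.single dtype; the exact array formatting depends on NumPy.
>>> rb = ReplayBuffer(10, {"obs": {"shape": 3}})
>>> rb.add(obs=[[1,2,3],
...             [4,5,6]])
0
>>> rb.get_all_transitions()
{'obs': array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)}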
- get_buffer_size(self) → size_t
Get buffer size
- Returns
buffer size
- Return type
size_t
- get_current_episode_len(self) → size_t
Get current episode length
- Returns
Current episode length
- Return type
size_t
- get_next_index(self) → size_t
Get the next index to store
- Returns
the next index to store
- Return type
size_t
- get_stored_size(self) → size_t
Get stored size
- Returns
stored size
- Return type
size_t
- is_Nstep(self) → bool
Get whether Nstep is used or not
- Returns
Whether Nstep is used
- Return type
bool
- load_transitions(self, file)
Load transitions from file
- Parameters
file (str or file-like object) – File to read data
- Raises
ValueError – When file format is wrong.
Warning
To avoid security vulnerabilities, you must not load an untrusted file, since this method is based on pickle.
- on_episode_end(self) → void
Call on episode end
Finalize the current episode by moving remaining Nstep buffer transitions, evacuating overlapped data for memory compression features, and resetting episode length.
Notes
Calling this function at episode end is the user's responsibility, since episode exploration can be truncated at a certain length even though no done flag is set by the environment.
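Example
A minimal sketch of a manually truncated episode; it assumes get_current_episode_len counts the transitions added since the last episode end, as described above.
>>> rb = ReplayBuffer(32, {"obs": {"shape": 3}, "done": {}})
>>> rb.add(obs=[1,2,3], done=0)
0
>>> rb.get_current_episode_len()
1
>>> rb.on_episode_end()
>>> rb.get_current_episode_len()
0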
- sample(self, batch_size)
Sample the stored transitions randomly with the specified size
- Parameters
batch_size (int) – sampled batch size
- Returns
sample – Sampled batch of transitions, which might contain the same transition multiple times.
- Return type
dict of ndarray
Examples
>>> rb = ReplayBuffer(1e+6, {"obs": {"shape": 3}}) >>> rb.add(obs=[1,2,3]) >>> rb.add(obs=[[1,2,3],[1,2,3]]) >>> rb.sample(4) {'obs': array([[1., 2., 3.], [1., 2., 3.], [1., 2., 3.], [1., 2., 3.]], dtype=float32)}
- save_transitions(self, file, *, safe=True)
Save transitions to file
- Parameters
file (str or file-like object) – File to write data
safe (bool, optional) – If False, a more aggressive compression is attempted, which might become incompatible with future versions.
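Example
A round-trip sketch using save_transitions and load_transitions; the file name "transitions.npz" is illustrative, and the on-disk format may depend on the cpprb version.
>>> rb = ReplayBuffer(10, {"obs": {"shape": 3}})
>>> rb.add(obs=[1,2,3])
0
>>> rb.save_transitions("transitions.npz")
>>> rb2 = ReplayBuffer(10, {"obs": {"shape": 3}})
>>> rb2.load_transitions("transitions.npz")
>>> rb2.get_stored_size()
1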